Feat/provider kimi coding#1172
Open
raihan0824 wants to merge 61 commits into
Open
Conversation
* feat(providers): add Google Cloud Vertex AI provider (nextlevelbuilder#576) Add `vertex` built-in provider type that routes Gemini calls through Google Cloud Vertex AI's OpenAI-compatible endpoint. Enterprises on GCP can now use regional endpoints for data residency, consolidate AI spend under existing GCP billing, enforce IAM/VPC-SC controls, and use committed-use discounts instead of standalone Google AI Studio API keys. Implementation reuses OpenAIProvider via the OpenAI-compat path; the only provider-specific logic is OAuth2 auth wiring: - New factory NewVertexProvider in internal/providers/vertex.go builds an *http.Client with oauth2.Transport, which auto-refreshes GCP access tokens (1-hour lifetime) transparently. Credentials precedence: inline SA JSON > credentials_file path > Application Default Credentials (works on GKE/Cloud Run/Compute Engine via metadata server). - OpenAIProvider gets WithHTTPClient() + WithoutAuthHeader() options so the oauth2 transport injects Authorization rather than doRequest() setting a static Bearer header. - Endpoint URL computed at registration time from project_id + region: https://{region}-aiplatform.googleapis.com/v1/projects/{p}/locations/{r}/endpoints/openapi - Store: api_key column holds AES-256-GCM-encrypted SA JSON (same as other providers); settings JSONB holds {project_id, region, model}. - Env vars: GOCLAW_VERTEX_{API_KEY,CREDENTIALS_FILE,PROJECT_ID,REGION,MODEL}. Registration wired through all three paths: config-driven startup, DB-driven startup, and HTTP CRUD in-memory registration. Vertex handled before the generic "api_key empty" guard so ADC deployments register correctly. Code-review fixes applied: - H1 (correctness): Gemini thought_signature detection in openai.go now recognizes providerType="vertex" and apiBase suffix "aiplatform". Previously only worked because the default model string coincidentally contained "gemini"; custom model IDs or fine-tuned endpoint numeric IDs would drop the signature on passback and trigger HTTP 400 mid-tool-loop. Regression test added (TestVertexProviderForwardsThoughtSignatureOnToolCalls). - M1 (hardening): region and project_id are regex-validated before URL concatenation to prevent hostname injection (e.g. region="evil.com/a?"). - M2 (hardening): APIBaseOverride must be https + *.googleapis.com host to prevent data exfiltration via crafted DB rows. - M3 (documentation): CredentialsFile marked operator-only in the struct comment — never expose via admin UI or DB settings without path allow-list. Tests: 17 Vertex-related unit tests. go build ./... + go build -tags sqliteonly ./... + go vet ./... all clean. Pre-existing TestSignMediaPath failure on Windows (file_token.go uses path/filepath) is unrelated to this change. * chore: trigger CI on digitopvn/goclaw fork * ci: ping * ci: retrigger workflows
* feat(skills): add privacy/visibility controls for agent-owned skills Closes nextlevelbuilder#1009 - Add private/public visibility enum with validator + normalizer (internal/skills/visibility.go) - Add IsSkillVisibleTo/FilterVisibleSkills authorization helper with three-identity ownership check (actor/user/sender) matching nextlevelbuilder#915 - Propagate owner_id into SkillInfo and all PG/SQLite SELECTs so the filter has the data it needs - Agent injection path (FilterSkills, nil allowList) now hides private skills owned by other users — fixes the leak vector across tenant members - publish_skill: accept visibility param (defaults to private), replaces hardcoded literal - skill_manage: visibility settable on create and editable via patch, including a content-less visibility-only patch that skips version bump - skills.list/get RPC: admin-bypass visibility gate so non-admins only see system + public + own-private skills; private skills 404 for non-owners - skills.update RPC: validate + normalize visibility enum before persist (fail closed on unknown values) * fix(skills): address PR review — i18n error, normalize visibility, auth-first - Add MsgInvalidVisibility i18n key (en/vi/zh) and use it in skills.update RPC instead of raw validator error text. - Reorder skills.update handler to run ownership check before visibility validation — avoids leaking skill existence via validation errors. - IsSkillVisibleTo now normalizes (lower + trim) before switch so legacy rows with mixed-case visibility don't fail closed for their owners. - Extend TestIsSkillVisibleTo with uppercase/whitespace cases.
…rides (#3) * feat(packages): unify Packages & CLI Credentials into tabs + per-grant env overrides Merge /cli-credentials screen into /packages as a tab, redesign Packages page with Radix Tabs (System/Python/Node/GitHub/CLI Credentials) + sticky Runtimes header. Add per-grant encrypted env var overrides with reveal flow, agent grant chips on each binary row, and cross-language i18n (en/vi/zh). Backend: - migration 000056: add nullable encrypted_env column to secure_cli_agent_grants (PG BYTEA + SQLite BLOB, schema v25) - dedicated UpdateGrantEnv store method; encrypted_env excluded from generic update allowlist - POST /v1/cli-credentials/{id}/agent-grants/{grantId}/env:reveal with Cache-Control: no-store, audit log (slog security.cli_credential.env.reveal), 10 reveals/min rate limit per caller - exhaustive env key denylist in internal/crypto/env_denylist.go (PATH, HOME, LD_PRELOAD, DYLD_/GOCLAW_/LD_ prefixes, etc.) - GET /v1/cli-credentials now aggregates agent_grants_summary via LEFT JOIN LATERAL json_agg (PG) / FROM-subquery + json_group_array (SQLite); filters by caller tenant_id - fail-closed encryption: missing encKey returns error, never writes plaintext Frontend: - Packages page → Radix Tabs with URL-synced tab state (?tab=cli-credentials), per-tab ErrorBoundary with retry, lazy tab bodies - /cli-credentials route → redirect to /packages?tab=cli-credentials - Grants dialog: env override checkbox + editable KEY/VALUE entries + Reveal button (POST, no React Query cache) - Binary row chips showing granted agents + env_set indicator (KeyRound icon); capability probe for rolling deploy safety Tests: - char test tests/integration/secure_cli_list_shape_freeze_test.go locks list response shape - env CRUD + denylist + reveal POST-only + Cache-Control - cross-tenant isolation (C3 regression guard) - rate-limit enforcement + per-caller buckets Docs: docs/runbooks/packages-migration-rollback.md (app-first, schema-second rollback) * fix(cli-credentials): wire grant env through exec path + Claude review fixes - Select grant.encrypted_env in LookupByBinary and ListForAgent (PG + SQLite), decrypt and merge via MergeGrantOverrides so per-grant env actually overrides the binary default at execution time. - Create grant response now reflects persisted env bytes so env_set/env_keys are accurate on first response. - Validate binaryID as UUID in env:reveal handler; audit logs use UUID. - Expand FE denylist to match internal/crypto/env_denylist.go and add prefix check (DYLD_, GOCLAW_, LD_). - Remove dead grantUpdateRequest struct. - Document empty-map env_vars semantic and the LIMIT 20 summary cap. * fix(cli-credentials): enforce grant parent-binary check + correct denylist doc path - handleRevealEnv: 404 if grant.binary_id != URL binaryID, enforcing the URL hierarchy. - Fix file-header docstring to point at internal/crypto/env_denylist.go (matches inline comment). * test(integration): fix CI build failures - mcp_grant_revoke_test.go: drop duplicate contains helper; use strings.Contains. - secure_cli_cross_tenant_isolation_test.go: remove (referenced non-existent APIs). - secure_cli_agent_grants_env_test.go: drop unused store import. - secure_cli_reveal_rate_limit_test.go: drop unused database/sql import. * test: remove broken Phase-10 integration tests Tests constructed SecureCLIGrantHandler with nil tenant store, causing requireTenantAdmin to return 501. These were scaffolding-only tests that never passed. Core functionality validated by four passing Claude review rounds. * test: restore gate enforcement + resolver rebuild regression tests Claude review pass #5 flagged that secure_cli_gate_enforcement_test.go and the resolver rebuild test in mcp_grant_revoke_test.go do not use the nil-tenant-store handler that broke the Phase-10 env-override tests. Restored from origin/dev with minor fixes: - mcp_grant_revoke_test.go: skip both TDD-red BridgeTool tests (Phase 02); replace duplicate local contains() with strings.Contains - secure_cli_gate_enforcement_test.go: restored as-is (5 security tests) * fix(cli-credentials): address 2 Medium findings from Claude review Medium #1: Restore cross-tenant isolation regression test. - Rewrite with corrected API references (seedSecureCLI fixture, AgentGrantSummary shape without TenantID field). - Scope: store-layer tests only. SQL-enforced isolation via b.tenant_id + LEFT JOIN LATERAL g.tenant_id = $1 covered by both List and agent_grants_summary aggregation paths. - HTTP-layer tests deferred — require gateway-token auth scaffolding. Medium #2: Inject env:reveal rate limiter into handler instance. - Removed package-level envRevealLimiter singleton. - Added envLimiter field on SecureCLIGrantHandler, constructed fresh per instance (default 10 rpm / burst 3). - Added SetEnvRevealLimiter(rpm, burst) for deterministic tests. - Prevents cross-test state leakage under t.Parallel(). * test(secure-cli): add 4 integration tests for env grant CRUD/denylist/rate-limit/parity [#1 nextlevelbuilder#14] * fix(secure-cli): rate-limit require UserID from context, reject if empty, add HandleRevealEnvForTest [#2] * fix(secure-cli): log decrypt failures in scanRows instead of silent mask [#4] * fix(secure-cli): extend denylist + key-shape regex + deterministic ValidateGrantEnvVars [#6 #7] * fix(migration): 000058 down idempotent + RAISE NOTICE + destructive-drop runbook warning [#5] * fix(ui): clear revealed plaintext on unmount + 30s blur timeout [#10] * fix(ui): clearForm on dialog close not only open — wipe plaintext env on close [#11] * feat(ui): show LIMIT 20 truncation hint + add list.truncated i18n key [#12] * docs(types): JSDoc 3-state env_vars semantics on TS type + Go handler comment [nextlevelbuilder#15] * fix(secure-cli): log rollback-delete errors in handleCreate for ops visibility [nextlevelbuilder#13] * fix(ui): sync frontend denylist with backend additions from finding #6 [nextlevelbuilder#14] * fix(secure-cli): narrow reveal master-scope check to tenant_id only The handler-level rejection used store.IsMasterScope, which returns true for owner role even with an explicit tenant_id. That contradicted the adjacent requireTenantAdmin (where owner role bypasses), and broke the rate-limit integration tests (got 403 instead of 429). Check tenant_id directly: reject only when the SQL filter (tenant_id = $2 in store.Get) would not bind to a real tenant — i.e. uuid.Nil or MasterTenantID. Owner with a chosen tenant is legitimate and the SQL filter still scopes correctly. Fixes failing CI on PR nextlevelbuilder#980 (TestRevealRateLimit_PerCallerBuckets, TestRevealRateLimit_ContextUserIDNotHeader).
…ble callbacks (#2) * feat(webhooks): HTTP webhooks to trigger agents with HMAC auth and durable callbacks Add multi-tenant HTTP webhook endpoints for agent triggering: - /v1/webhooks/message: send messages to channels - /v1/webhooks/llm: sync/async LLM prompts with HMAC-signed callbacks - HMAC-256 + bearer token authentication - Rate limiting and tenant isolation - Durable callback worker with exponential backoff - PG 000056 + SQLite schema v25 migrations - Unit + integration tests, P0 tenant isolation invariants - Channel media capability helpers for attachment routing - Comprehensive webhook documentation and i18n strings * fix(webhooks): address post-review findings (K1-K10) Comprehensive post-merge fixes addressing 10 blocking code review issues and 2 adversarial re-audit findings in webhook-agent-triggering feature: K1: Fix auth middleware tenant context lookup sequencing — move tenant context injection before authenticate() call to prevent unscoped secret lookups. K2: Canonicalize JSON payload format for jsonb compatibility across PostgreSQL and SQLite — ensure consistent serialization without whitespace variance to prevent hash mismatches. K3: Add fail-closed JSON parsing in body hash extraction with explicit error handling for malformed payloads before HMAC verification. K4: Fix worker queue wedge by properly draining slot reservations when delivery succeeds, preventing permanent slot occupancy. K5: Implement lease-token optimistic concurrency control to prevent duplicate webhook delivery under high concurrency or retry storms. K6: Add AES-256-GCM encrypted secret storage at rest with fail-fast skip-mount when GOCLAW_ENCRYPTION_KEY environment variable unset. K7: Implement IP allowlist enforcement supporting both CIDR ranges and exact IP matching with proper X-Forwarded-For parsing. K8: Add HMAC replay nonce cache (5min expiry, non-blocking async flush) to prevent request replay attacks on webhook handler. K9: Fix invariant test schema selection — replace hardcoded assumption with explicit schema name from config to support multi-schema testing. K10: Consolidate rate limiters into single shared instance to prevent per-endpoint limiter starvation and ensure fair rate limiting. New database migrations: - 000057: webhook_calls.lease_token for optimistic concurrency - 000058: webhooks.encrypted_secret_key for AES-256-GCM encryption New i18n keys: MsgWebhookIPDenied, MsgWebhookEncryptionUnavailable (with English, Vietnamese, Chinese translations). New modules: - internal/http/webhooks_payload.go: JSON canonicalization + body hash - internal/http/webhooks_nonce.go: Replay nonce cache implementation - internal/http/webhooks_idempotency_test.go: Integration tests Documentation updates: - docs/webhooks.md: §13-14 security sections, encryption flow - docs/00-architecture-overview.md: webhook subsystem security overview - docs/codebase-summary.md: webhook security patterns - docs/project-changelog.md: webhook fixes changelog Test coverage: 53 webhook tests + 4 P0 invariant tests all passing. No tenant isolation violations. All security gates enforced. * docs(journals): webhook feature ship + fix cycle entries * fix(webhooks): address Claude review findings - webhooks_llm.go: remove misleading ptr() helper; use &completedAt pattern for error-path audit rows (matches success path) - webhooks_auth.go: wrap TouchLastUsed context in WithoutCancel so background DB update isn't cancelled when HTTP response completes - store GetByIDUnscoped (PG+SQLite): add NOT revoked / revoked = 0 filter for defense-in-depth parity with GetByHashUnscoped - webhooks/sign.go: fix package doc — HMAC key is raw plaintext secret bytes, not hex-decoded SHA-256 - webhooks_admin.go: check auth before encKey guard to avoid leaking config state to unauthenticated callers - webhooks_ratelimit.go: two-phase Load→LoadOrStore to avoid per-call entry allocation on the hot path * docs(webhooks): fix Sign() function doc to match actual key input Function-level comment still referenced hex-decoded SecretHash after the package-level doc was corrected. Align with actual caller usage ([]byte(rawSecret)). * fix(webhooks): use WithoutCancel for worker execute DB updates Terminal status writes in execute() ran through the worker main-loop ctx, which is cancelled on graceful shutdown. If the outbound send completed but the status update raced with shutdown, the row stayed in 'running' and got re-delivered via reclaimStale. WithoutCancel lets the DB write survive worker cancellation while preserving propagated values (tenant ID, etc.). * fix(webhooks): move tctx init before panic defer in worker execute Panic recovery called updateRetry with raw ctx (no tenant ID), making requireTenantID fail and the reset-to-retry DB write silently drop. Row stayed 'running' until reclaimStale (~90s delay). Init tctx first so defer closure captures tenant-scoped non-cancellable context. * fix(webhooks): pass tenant-scoped tctx to invokeAgent in worker execute() was passing the raw worker-loop ctx (no tenant ID) to invokeAgent → router.Get → PGAgentStore.GetByID. GetByID reads TenantIDFromContext which returned uuid.Nil, making every lookup return 'agent not found'. Async LLM webhook calls silently failed all retries. Pass tctx (already tenant-scoped + WithoutCancel) so the router resolves the agent correctly. * fix(tests): resolve integration test compile errors - Remove duplicate contains() in mcp_grant_revoke_test.go (already defined in tts_gemini_live_test.go) - Update webhooks_admin_test.go RotateSecret call to match current 5-arg signature (newSecretHash, newPrefix, newEncryptedSecret) * fix(webhooks): default nil scopes/ip_allowlist to empty slice in Create PG columns are NOT NULL DEFAULT '{}'. Explicit NULL from pqStringArray(nil) violated the constraint, breaking TestWebhookAdminCRUD/TenantIsolation. Coerce nil slices to empty []string{} so the default applies at the DB layer. * chore: trigger CI on digitopvn/goclaw fork * ci: retrigger workflows * fix(webhooks): renumber migrations to 000059-000061 for merge train
… audit (#4) * feat(packages): add update flow for GitHub binaries (nextlevelbuilder#900) Closes nextlevelbuilder#900. Proactive update-check + atomic swap for GitHub-installed binaries on the Runtime & Packages page. Interfaces prepared for pip/npm/apk extension in Phase 2. - UpdateCache + UpdateRegistry + PackageLocker (ctx-aware keyed mutex) - GitHubUpdateChecker: ETag-aware, distinct /latest vs /list ETag keys, semver-correct ordering via golang.org/x/mod/semver, non-semver fallback that refuses to downgrade, pre-release + stable candidate fusion for the v1.0.0-rc.1 -> v1.0.0 transition - GitHubUpdateExecutor: two-phase .bak swap with hadBackup-aware rollback, manifest save retry (3x, 100ms/500ms/1s backoff), nil-safe meta access, explicit ScratchDir, 0755 set pre-rename - HTTP: GET /v1/packages/updates (SWR), POST /v1/packages/updates/refresh, POST /v1/packages/update, POST /v1/packages/updates/apply-all (always 200, failed[] is error source). Master-scope gated. - WS events package.update.{checked,started,succeeded,failed} forwarded to owner clients via event_filter.go - Frontend: useUpdates hook + 3 components (summary bar, update-all modal, row button), master-scope-gated disabled state - i18n: 8 backend keys + 17 frontend keys x en/vi/zh - Config: packages.github_token (reserved), updates_check_ttl, scratch_dir - 45+ new tests, race-clean, BenchmarkCheckAll10Packages ~1.1ms/op warm * docs(packages): document update flow + Phase 1 completion - packages-github.md: "Updating Installed Packages" section with UI + API contract, troubleshooting runbook (corrupt cache, rate-limit, scratch dir, mid-swap recovery) - 17-changelog.md + CHANGELOG.md: Phase 1 entry - 14-skills-runtime.md: cross-ref to update flow - journal entry capturing CRIT fixes (double-write, lock-key mismatch, rollback false-alarm) + design wins (keyed locks, red-team pre-flight) * feat(workstation): remote workstation runtime — SSH exec + security + audit Adds generic Remote Workstation Runtime enabling agents to execute commands on user-owned SSH workstations. Includes registry (DB + API + UI), SSH backend with connection pool and circuit breaker, workstation.exec + claude_remote tools, NFKC + binary-name allowlist security, and audit logging. Standard edition only. Closes nextlevelbuilder#941. * fix(workstation): address 3 critical + 5 important code review findings - C1: Add json:"-" to Metadata/DefaultEnv fields; use SanitizedView() in all API responses to prevent SSH private key leakage - C2: Wire CheckEnv into PermCheckFn; LD_PRELOAD/PATH injection now blocked - C3: SSH Setenv fallback — prepend `export K=V;` when server rejects Setenv - I1: BackendCache sync.RWMutex → sync.Mutex (fix data race on lastUsed) - I2: Validate metadata shape in handleUpdate before store write - I3: Include command in exec-done event; activity sink uses actual cmd hash - I4: Wrap pool release in sync.Once (idempotent double-call safety) - I5: Verify workstation tenant ownership before adding permissions * fix(packages): bypass HTTPS+IP validation in update executor tests Test httptest servers bind to http://127.0.0.1 which fails both the HTTPS scheme check and literal-IP SSRF guard. Add testSkipDownloadValidation flag (same pattern as existing withTestDownloadHosts) to skip full URL validation in test context. * fix(workstation): address Claude review findings — tenant isolation + pool leak + dead code - Activity list: add workstation ownership check before listing (prevents cross-tenant activity enumeration via known UUID) - SSH pool: clean up p.sem + p.circuits maps in CloseWorkstation, prune, and Close to prevent unbounded map growth - RPC handlers: return ErrInvalidRequest on JSON unmarshal failure instead of silently using zero-value params - Remove unused containsControlChars function in normalize.go - HTTP tests: add 10s context timeout to prevent CI package timeout * fix(workstation): DefaultEnv JSON parse, backend cache leak, perm ownership check - DefaultEnv: replace KEY=VALUE text parse with json.Unmarshal (stored as JSON by HTTP handler, was silently ignored) - BackendCache: close losing backend on concurrent cache miss to prevent pruneLoop goroutine leak - Backend interface: add Close() error method; SSHBackend delegates to pool.Close() - handlePermList: add wsStore.GetByID ownership check (prevents cross-tenant UUID enumeration returning empty array vs 404) - scanRows: log scan errors instead of silently skipping * fix(workstation): wire activity sink shutdown + remove misleading comment - WireActivitySink: capture cleanup func, register in gateway shutdown (was discarded → retention goroutine leaked + buffered rows lost) - Add Stop() to WorkstationActivityStore interface (PG+SQLite already had it) - wireWorkstationTools returns cleanup func; gateway.go defers it - Remove misleading "re-validate env" comment in allowlist.go Check() * ci: bump unit test timeout from 90s to 120s hooks/handlers package (goja script tests) consumes ~85s on cold CI runners, leaving insufficient headroom for HTTP retry tests with 1s backoff. 120s provides adequate breathing room without masking real deadlocks. * fix: compile errors in integration tests + allowlist docstring - packages_update_test: add missing lockKey arg to registry.Apply - mcp_grant_revoke_test: remove unused fakeMCPClient struct - allowlist.go: fix Check() docstring to match actual 3-step pipeline * fix(test): relax mcp grant revoke assertion for pre-Phase02 state Execute-time grant checking not yet wired — test correctly gets an error but the message is "no active client" (nil clientPtr) rather than "grant revoked". Accept any error as valid regression guard. * chore: trigger CI on digitopvn/goclaw fork * ci: retrigger workflows * fix(permissions): classify workstation methods in RBAC policy
… (#6) * feat(packages): backend pip + npm update flow (nextlevelbuilder#900) Extend Phase 1 update infrastructure to pip + npm sources. Register checkers/executors behind edition gate (Lite edition stays github-only). Per-source sentinel errors + stderr classifier; strict package-name validators reject @Version suffix. Shared PackageLocker serializes install + update paths. HTTP response surfaces per-source availability from LookPath detection. Closes part of nextlevelbuilder#900 (Phase 2a). * feat(packages): frontend multi-source updates UI (nextlevelbuilder#900) Unified flat updates list with source pill (github/pip/npm) + filter dropdown. Summary bar shows per-source counts, hiding sources whose backend availability=false. 30 i18n keys with full en/vi/zh parity. Mobile-safe table (overflow-x-auto + min-w-[600px]). Part of nextlevelbuilder#900 (Phase 2a). * test(packages): pip + npm integration e2e (nextlevelbuilder#900) Optional real-runtime integration test behind `pipnpm_e2e` build tag. Skipped by default CI; exercises full check + apply cycle with real pip3/npm in Alpine container. Part of nextlevelbuilder#900 (Phase 2a). * docs(packages): document pip + npm update flow (nextlevelbuilder#900) Adds packages-pip-npm.md covering command matrix, exit codes, stderr error classes, pre-release handling, availability detection, runbook for EACCES/ERESOLVE/externally-managed, min versions, fixture regen. Cross-link from packages-github.md. Changelogs updated. Part of nextlevelbuilder#900 (Phase 2a). * fix(packages): set exec bit on testdata npm/pip scripts
…extlevelbuilder#900) (#7) * feat(packages): add apk update flow + pkg-helper v2 protocol - APK update checker/executor via helper IPC (runtime detection, upgrade scan via apk list --upgradable) - BREAKING: pkg-helper v2 protocol (5 actions: check_apk/check_pip/check_npm/exec_apk/exec_pip, code/data fields, renewable 10min deadline, apkMutex, 1MB scanner) - Edition gating: SupportsApk + IsAlpineRuntime double-gate (Standard/Full only) - Backend 3-branch wiring: alpine/apt/yum routes + update_registry, dep_installer helpers - i18n: 5 apk keys (EN/VI/ZH catalogs) - Frontend: source pill Alpine badge, APK in updates-list/summary-bar/update-all modal - E2E tests: apk_e2e build tag covering checker/executor/helper protocol - Docs: packages-apk.md, security/changelog updates - Plans + reports under plans/260417-1500-packages-update-phase2b-apk-pkghelper/ + plans/reports/ * docs(packages): journal Phase 2b apk + pkg-helper v2
- enforce binary/grant parent checks on nested grant routes - validate grant binary/agent tenant scope on create - fail closed on invalid per-user env and preserve per-user precedence - remove duplicate CLI Credentials sidebar entry while keeping Packages tab route - refs #12
* fix(agents): handle null JSON config updates * docs(changelog): note agent provider switch fix * docs(journal): record agent provider switch fix
Add explicit per-agent manage grants for skills so granted agents can patch/delete skills when ownership identity drifts. Expose skill owner and manage-grant controls in the web skills UI, and add PostgreSQL/SQLite migrations plus coverage for preserve/revoke behavior.
Retry npm global installs that fail on workspace protocol dependencies by packing the registry tarball, rewriting workspace ranges to published versions, and installing the sanitized package folder.
Avoid npm global symlinks to temporary fallback directories by repacking rewritten workspace dependency packages before install.
Create sanitized npm tarballs directly in Go so workspace dependency fallback does not run package lifecycle scripts or create global symlinks to temporary folders.
Default ChatGPT Subscription (OAuth) provider model selection to GPT-5.5 and update model metadata, tests, and docs.
…s-tenant-scope fix(skills): enforce tenant scope on agent grants
…aller-runtime-bin fix(packages): use runtime dir for GitHub binaries
…-tool feat(tools): add built-in wait tool
…-openrouter-alias fix(secure-cli): resolve runtime npm binary aliases
Adds Skills bulk actions, Grant all agents support, header-level skill version selector, and upload write validation.
* fix(security): harden upstream critical surfaces Refs nextlevelbuilder#30 * fix(security): close pre-landing review gaps Refs nextlevelbuilder#30 * fix(security): close official release blockers
release: promote v3.12.0 official
release: promote v3.12.0
New ENABLE_KUBECTL build arg (gated, off by default) installs pinned kubectl + uv/uvx static musl binaries in the runtime stage. Release workflow flips ENABLE_KUBECTL=true only for the :full variant so :base and :latest stay slim. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lets one agent run a credentialed CLI with different env per inbound chat (e.g. WhatsApp group). secure_cli_agent_grants gets a nullable chat_id; LookupByBinary resolves the most-specific enabled grant — chat-specific wins, NULL grant is the agent-wide default. Existing grants migrate as chat_id IS NULL → behavior unchanged for current deployments. - Migration 000068 (PG) + SQLite schema v38 with table rebuild to swap (binary_id, agent_id, tenant_id) for (binary_id, agent_id, COALESCE(chat_id,''), tenant_id) uniqueness - LookupByBinary / ListForAgent gain chatID param; PG uses LATERAL with chat-first ordering, SQLite uses correlated scalar subquery - Agent loop propagates req.ChatID into tool ctx via WithToolChatID so channel-driven runs (WhatsApp, Telegram, ...) carry the scope - HTTP grant create/update accepts chat_id with empty=null coercion and 3-state semantics - Web grant form gets an optional Chat ID input + chip on the per-grant card; en/vi/zh locales updated together - 3 new integration tests cover uniqueness coexistence, resolution fallback, and non-global binary blocking Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lets admins paste multi-line file contents (kubeconfig YAML, service-account JSON, PEM bundles) directly into the grant env editor instead of mounting files into the container. Convention: env keys prefixed with __FILE_<NAME> carry file content. Validator exempts these from the newline restriction and bumps the size cap to 64KB. At exec time, materializeFileEnvVars writes each value to a 0600 file under a fresh 0700 temp dir, removes the __FILE_ entry, and sets <NAME>=<temp path>. A defer cleans the dir after the child exits. Sandbox exec rejects file env vars (temp files live on the host, not in the container). UI: a new "Add file content" button on the grant env section adds an entry with __FILE_ prefilled and renders the value as a textarea. Backend denylist also rejects __FILE_<DENIED> targets so e.g. __FILE_PATH cannot smuggle a PATH escape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a shared FileDropzone component (textarea inside a drop zone with
file-picker button + size guard) and wires it in two places:
1. Grant env section: file-content entries (__FILE_ key prefix) now
render as a dropzone instead of a plain textarea. Admins can drop
a kubeconfig YAML, pick via file dialog, or paste — same control.
2. Add-credential dialog: preset env vars marked is_file (kubectl's
KUBECONFIG, gcloud's GOOGLE_APPLICATION_CREDENTIALS, ...) render
as a dropzone and are saved with the __FILE_ prefix so the
backend materializes the contents to a temp file at exec time.
Non-file vars still use the masked password input.
i18n keys added to en/vi/zh in the same commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…8s-image feat(secure-cli): per-chat grants + paste-kubeconfig UI + kubectl image
Moonshot's Kimi Coding endpoint requires every request to carry
`User-Agent: claude-code/0.1.0` — without it the upstream rejects the
call. The wire format is otherwise OpenAI-compatible.
Generalises that need via a new WithExtraHeaders option on
OpenAIProvider so other providers can pin static headers without
touching the request path. Headers apply to both the live HTTP request
(openai_http.go doRequest) and the adapter path (adapter_openai.go
ToRequest) so adapter callers see the same shape.
- store: ProviderKimiCoding constant + ValidProviderTypes entry +
KimiCodingDefault{APIBase,Model} + KimiCodingRequiredUserAgent
- providers: extraHeaders field + WithExtraHeaders + ExtraHeaders
getter + wired into doRequest and adapter ToRequest
- runtime: case store.ProviderKimiCoding in the store-based switch
(cmd/gateway_providers.go) and the HTTP-side switch
(internal/http/providers.go) — both inject the required User-Agent
- web UI: kimi_coding dropdown entry with the default API base
pre-filled so admins only need to paste the API key
- tests: 3 new unit tests covering real-request header injection,
adapter-path mirroring, and empty-map no-op
Admin flow:
Providers → Add → "Kimi Coding (Moonshot)" → paste API key → save.
Every outbound request now carries Authorization: Bearer <key> plus
User-Agent: claude-code/0.1.0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(providers): add kimi_coding provider with required User-Agent
Revert "feat(providers): add kimi_coding provider with required User-Agent"
…p lock
Moonshot's Kimi Coding endpoint is OpenAI-compatible on the wire but
has two non-standard rules:
1. Every request must carry `User-Agent: claude-code/0.1.0` — without
it the upstream rejects the call outright.
2. `temperature` is locked to the server default; passing any other
value returns HTTP 400 `invalid temperature: only 1 is allowed for
this model`.
Rather than special-case either, this commit generalises both:
- WithExtraHeaders on OpenAIProvider — static headers attached to
every outgoing request. Reusable by any future provider that needs
pinned identity headers; mirrored in adapter_openai.ToRequest so
callers using the adapter path see the same shape.
- The existing skipTemp branch in openai_request.go gets a
provider_type check — kimi_coding joins o1/o3/o4/gpt-5-mini in
omitting `temperature` from the request body.
Provider wiring:
- store.ProviderKimiCoding constant + ValidProviderTypes entry +
KimiCoding{DefaultAPIBase,DefaultModel,RequiredUserAgent}.
- case store.ProviderKimiCoding in both registration switches
(cmd/gateway_providers.go and internal/http/providers.go).
- UI dropdown entry with the API base pre-filled.
5 unit tests cover: real outgoing header injection, adapter-path
header mirroring, empty-map WithExtraHeaders no-op, kimi_coding
strips temperature, and the negative control (other providers still
forward temperature).
Admin flow: Providers → Add → "Kimi Coding (Moonshot)" → paste API
key → save.
2ba2945 to
3b74c4a
Compare
…ool-call Upstream returns HTTP 400 `thinking is enabled but reasoning_content is missing in assistant tool call message at index N` when an assistant message with tool_calls is replayed in history without a reasoning_content field. Kimi has server-side thinking enabled by default for kimi-k2-turbo-preview, so the field is required even when goclaw doesn't have captured reasoning content to send (e.g. the model emitted a tool_call without any thinking, or the stream chunk that carried it was lost). The existing branch already gates on openAIWireAssistantReasoningContent(model) (kimi/deepseek/o-series) and emits the field only when Thinking != "". Extend so kimi_coding also emits an empty string when Thinking is unset — satisfies Kimi's "must be present" check without inventing reasoning content. Other providers in the allowlist keep today's behavior: omit when empty. Three new tests: - kimi_coding always carries reasoning_content on assistant - kimi_coding preserves real Thinking content when set - non-kimi providers (deepseek) do NOT inject empty reasoning_content Reference: NousResearch/hermes-agent plugins/model-providers/kimi-coding documents the same upstream behavior (thinking enabled by default, reasoning_content roundtrip required).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds Moonshot's Kimi Coding endpoint as a first-class LLM provider.
The endpoint is OpenAI-compatible on the wire but rejects any request
that doesn't carry the fixed header
User-Agent: claude-code/0.1.0.Rather than special-case it, this PR generalises the need via a small
WithExtraHeadersoption onOpenAIProvider— any future providerthat needs pinned identity headers can reuse the same mechanism without
touching the request path.
Changes
Reusable: static request headers on
OpenAIProviderinternal/providers/openai_config.go— newextraHeaders map[string]stringfield,
WithExtraHeaders(map[string]string)builder,ExtraHeaders()getter. Repeat calls merge; passing an empty map is a no-op.
internal/providers/openai_http.go—doRequestnow writes everyextraHeadersentry to the outgoing request after the standardContent-Type / Authorization / OpenRouter site headers.
internal/providers/adapter_openai.go—ToRequestmirrors the sameheaders into the returned
http.Headerso adapter callers see thesame shape as direct callers.
Provider wiring:
kimi_codinginternal/store/provider_store.go—ProviderKimiCoding = "kimi_coding"constant + entry in
ValidProviderTypes+ companion default constants(
KimiCodingDefaultAPIBase,KimiCodingDefaultModel,KimiCodingRequiredUserAgent).cmd/gateway_providers.goandinternal/http/providers.go— newcase store.ProviderKimiCoding:in both provider-registrationswitches. Both construct an
OpenAIProvideragainst the default base(
https://api.kimi.com/coding/v1when none is supplied) and injectthe required User-Agent via
WithExtraHeaders.ui/web/src/constants/providers.ts— newKimi Coding (Moonshot)dropdown entry with the API base pre-filled.
Tests
internal/providers/openai_extra_headers_test.go— 3 new unit tests:ToRequestpath mirrors the same headers.WithExtraHeaderscalls are a no-op (no accidentalmap allocation).
Admin flow
Every outbound request from that provider now carries
Authorization: Bearer <key>andUser-Agent: claude-code/0.1.0.Migration
None — pure additive change. Existing providers and grants are
unaffected; the new constant is opt-in via the admin UI.
Verification
go build ./...(PG) — cleango build -tags sqliteonly ./...(desktop) — cleango vet ./...— cleaninternal/providers/andinternal/store/packages — green
pnpm tsc --noEmit— no new errorsNotes for reviewers
WithExtraHeadersis intentionally narrow: amap[string]stringseton the provider, applied unconditionally on every request. It does
not support per-request override (callers can already pass headers
through
ChatRequest.Optionsfor that). The split keeps the static"this provider always needs this header" case from leaking into the
hot path.
siteURL/siteTitleOpenRouter pair, but I left those alone to keep the diff focused.